Data description

Number of subjects per survey year

with(NDNS, tab1(SurveyYear, graph = FALSE, decimal = 2))
## SurveyYear : 
##             Frequency Percent Cum. percent
## NDNS Year 1       801   13.01        13.01
## NDNS Year 2       812   13.19        26.21
## NDNS Year 3       782   12.71        38.91
## NDNS Year 4      1055   17.14        56.05
## NDNS Year 5       625   10.15        66.21
## NDNS Year 6       663   10.77        76.98
## NDNS Year 7       703   11.42        88.40
## NDNS Year 8       714   11.60       100.00
##   Total          6155  100.00       100.00

Number of subjects per servey year by gender where Men == 1, Women == 2

with(NDNS, tabpct(SurveyYear, Sex,  graph = FALSE, decimal = 2))
## 
## Original table 
##              Sex
## SurveyYear        1     2  Total
##   NDNS Year 1   336   465    801
##   NDNS Year 2   350   462    812
##   NDNS Year 3   339   443    782
##   NDNS Year 4   418   637   1055
##   NDNS Year 5   249   376    625
##   NDNS Year 6   254   409    663
##   NDNS Year 7   321   382    703
##   NDNS Year 8   270   444    714
##   Total        2537  3618   6155
## 
## Row percent 
##              Sex
## SurveyYear          1       2  Total
##   NDNS Year 1     336     465    801
##                (41.9)  (58.1)  (100)
##   NDNS Year 2     350     462    812
##                (43.1)  (56.9)  (100)
##   NDNS Year 3     339     443    782
##                (43.4)  (56.6)  (100)
##   NDNS Year 4     418     637   1055
##                (39.6)  (60.4)  (100)
##   NDNS Year 5     249     376    625
##                (39.8)  (60.2)  (100)
##   NDNS Year 6     254     409    663
##                (38.3)  (61.7)  (100)
##   NDNS Year 7     321     382    703
##                (45.7)  (54.3)  (100)
##   NDNS Year 8     270     444    714
##                (37.8)  (62.2)  (100)
## 
## Column percent 
##              Sex
## SurveyYear        1        %     2        %
##   NDNS Year 1   336  (13.24)   465  (12.85)
##   NDNS Year 2   350  (13.80)   462  (12.77)
##   NDNS Year 3   339  (13.36)   443  (12.24)
##   NDNS Year 4   418  (16.48)   637  (17.61)
##   NDNS Year 5   249   (9.81)   376  (10.39)
##   NDNS Year 6   254  (10.01)   409  (11.30)
##   NDNS Year 7   321  (12.65)   382  (10.56)
##   NDNS Year 8   270  (10.64)   444  (12.27)
##   Total        2537    (100)  3618    (100)

Summary of their age

NDNS %>% 
  group_by(SurveyYear, Sex) %>% 
  summarise(N = n(), MeanAge = mean(Age), SDAge = sd(Age), minAge = min(Age), maxAge = max(Age))
## # A tibble: 16 x 7
## # Groups:   SurveyYear [?]
##    SurveyYear  Sex       N MeanAge SDAge minAge maxAge
##    <chr>       <chr> <int>   <dbl> <dbl>  <dbl>  <dbl>
##  1 NDNS Year 1 1       336    49.9  17.3     19     86
##  2 NDNS Year 1 2       465    49.2  17.8     19     94
##  3 NDNS Year 2 1       350    48.9  17.3     19     96
##  4 NDNS Year 2 2       462    50.2  17.9     19     92
##  5 NDNS Year 3 1       339    48.5  16.9     19     87
##  6 NDNS Year 3 2       443    49.6  18.0     19     93
##  7 NDNS Year 4 1       418    51.4  17.0     19     90
##  8 NDNS Year 4 2       637    48.8  17.0     19     94
##  9 NDNS Year 5 1       249    51.7  16.6     19     93
## 10 NDNS Year 5 2       376    49.4  17.8     19     92
## 11 NDNS Year 6 1       254    51.1  17.7     19     93
## 12 NDNS Year 6 2       409    49.0  18.1     19     95
## 13 NDNS Year 7 1       321    51.5  18.3     19     92
## 14 NDNS Year 7 2       382    49.8  18.3     19     89
## 15 NDNS Year 8 1       270    50.4  16.8     19     90
## 16 NDNS Year 8 2       444    49.9  17.7     19     94

Distribution of the dietary data (by Day of Week)

Day1

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSday1_4.Rdata")
with(dta_d1_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          726   11.80        11.80
## Tuesday         847   13.77        25.56
## Wednesday       814   13.23        38.79
## Thursday       1082   17.58        56.38
## Friday         1013   16.46        72.84
## Saturday        848   13.78        86.62
## Sunday          823   13.38       100.00
##   Total        6153  100.00       100.00

Day2

with(dta_d2_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          822   13.36        13.36
## Tuesday         727   11.82        25.17
## Wednesday       848   13.78        38.96
## Thursday        812   13.20        52.15
## Friday         1081   17.57        69.72
## Saturday       1015   16.50        86.22
## Sunday          848   13.78       100.00
##   Total        6153  100.00       100.00

Day3

with(dta_d3_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          846   13.75        13.75
## Tuesday         824   13.40        27.15
## Wednesday       725   11.79        38.94
## Thursday        850   13.82        52.76
## Friday          810   13.17        65.92
## Saturday       1080   17.56        83.48
## Sunday         1016   16.52       100.00
##   Total        6151  100.00       100.00

Day4

with(dta_d4_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          994   16.50        16.50
## Tuesday         833   13.82        30.32
## Wednesday       811   13.46        43.78
## Thursday        705   11.70        55.48
## Friday          830   13.77        69.25
## Saturday        792   13.14        82.39
## Sunday         1061   17.61       100.00
##   Total        6026  100.00       100.00

Distribution of the dietary data (by DayNo)

Monday

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSMon_Sun.Rdata")
with(dta_Mon_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             726   21.43        21.43
## 2             822   24.26        45.69
## 3             846   24.97        70.66
## 4             994   29.34       100.00
##   Total      3388  100.00       100.00

Tuesday

with(dta_Tue_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             847   26.21        26.21
## 2             727   22.50        48.72
## 3             824   25.50        74.22
## 4             833   25.78       100.00
##   Total      3231  100.00       100.00

Wednesday

with(dta_Wed_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             814   25.45        25.45
## 2             848   26.52        51.97
## 3             725   22.67        74.64
## 4             811   25.36       100.00
##   Total      3198  100.00       100.00

Thursday

with(dta_Thurs_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1            1082   31.37        31.37
## 2             812   23.54        54.91
## 3             850   24.64        79.56
## 4             705   20.44       100.00
##   Total      3449  100.00       100.00

Friday

with(dta_Fri_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1            1013   27.13        27.13
## 2            1081   28.95        56.08
## 3             810   21.69        77.77
## 4             830   22.23       100.00
##   Total      3734  100.00       100.00

Saturday

with(dta_Sat_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             848   22.70        22.70
## 2            1015   27.18        49.88
## 3            1080   28.92        78.80
## 4             792   21.20       100.00
##   Total      3735  100.00       100.00

Sunday

with(dta_Sun_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             823   21.96        21.96
## 2             848   22.63        44.58
## 3            1016   27.11        71.69
## 4            1061   28.31       100.00
##   Total      3748  100.00       100.00

The problem of analysing data by day of the week would be that every subject only contributed 3-4 days’ data. Therefore, we cannot see one subject’s dietary change during a Mon-Sun week.

Dietary data by day

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSday1_4.Rdata")

vecid <- unique(dfs3$id)

vecid1<-unique(dta_day1$id) # n = 6153
vecid2<-unique(dta_day2$id) # n = 6153
vecid3<-unique(dta_day3$id) # n = 6151
vecid4<-unique(dta_day4$id) # n = 6026


setdiff(vecid, vecid1) # two subjects did not have day 1 data
## [1] 50506161 70908241
setdiff(vecid, vecid2) # two subjects did not have day 2 data
## [1] 31012251 40714261
setdiff(vecid, vecid3) # four subjects did not have day 3 data
## [1] 10914251 11205071 80702191 81210131
setdiff(vecid, vecid4) # 129 subjects did not have day 4 data
##   [1] 10112011 10701161 10702161 10707261 10906181 10910111 10914251
##   [8] 20106041 20116171 20202081 20205081 20301211 20307041 20405101
##  [15] 20509211 20602011 20615041 21002101 21011041 21107031 21113041
##  [22] 21211041 21211101 30113231 30205131 30205201 30402131 30404081
##  [29] 30411081 30417081 30603071 30605131 30609131 30708201 30709031
##  [36] 30906071 30906201 30907251 30912021 31110201 40101011 40104021
##  [43] 40109221 40116011 40214081 40221221 40315101 40402221 40410251
##  [50] 40504211 40506221 40516021 40710081 40710101 40714251 40714261
##  [57] 40803081 40803221 40808081 40814131 40816011 40902051 40904021
##  [64] 41012081 41016131 41202051 50104191 50105161 50306241 50310271
##  [71] 50501271 50504271 50710161 51002141 51002191 51004011 51102241
##  [78] 51203191 51205141 51208041 51209071 60202081 60202261 60206161
##  [85] 60310131 60313021 60405161 60508071 60606271 60808161 60909271
##  [92] 61013261 61102251 61109081 70113191 70302241 70305031 70309241
##  [99] 70311181 70311251 70407251 70613181 70703181 70714181 70802241
## [106] 70812251 70815241 71101061 71206191 80108061 80301061 80301281
## [113] 80302191 80308251 80312241 80405181 80405281 80410131 80611131
## [120] 80713281 80805191 81002251 81004251 81005191 81007061 81101221
## [127] 81110061 81110131 81203221

LCA analyses by day

Day 1

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d1_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d1_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d1_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d1_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d1_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d1_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d1_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d1_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

Model comparison and selection

Model Comparison. (Day 1, n = 6153)
N of Class log-likelihood resid. df AIC BIC aBIC cAIC likelihood-ratio Entropy
1 -100728.42 6081 201600.8 202085.0 201856.2 202157.0 94737.22 —
2 -99875.26 6008 200040.5 201015.6 200554.8 201160.6 93030.90 0.772
3 -99389.52 5935 199215.0 200681.0 199988.3 200899.0 92059.41 0.72
4 -99041.30 5862 198664.6 200621.5 199696.8 200912.5 91362.99 0.712
5 -98720.35 5789 198168.7 200616.5 199459.8 200980.5 90721.08 0.669
6 -98389.48 5716 197653.0 200591.7 199203.0 201028.7 90059.34 0.725
7 -98140.81 5643 197301.6 200731.2 199110.6 201241.2 89561.99 0.738
8 -97939.32 5570 197044.6 200965.1 199112.5 201548.1 89159.01 0.716
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 1 data (Fig. 4 classes)

Day 1 data (Fig. 6 classes)

Day 2

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d1_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d1_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d1_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d1_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d1_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d1_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d1_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d1_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  31.60514 mins

Model comparison and selection

Model Comparison. (Day 2, n = 6153)
N of Class log-likelihood resid. df AIC BIC aBIC cAIC likelihood-ratio Entropy
1 -99402.14 6081 198948.3 199432.5 199203.7 199504.5 92165.61 —
2 -98500.62 6008 197291.2 198266.3 197805.5 198411.3 90362.56 0.793
3 -97974.85 5935 196385.7 197851.7 197158.9 198069.7 89311.03 0.709
4 -97569.50 5862 195721.0 197677.9 196753.2 197968.9 88500.33 0.744
5 -97273.15 5789 195274.3 197722.1 196565.4 198086.1 87907.63 0.71
6 -96994.87 5716 194863.7 197802.4 196413.8 198239.4 87351.06 0.67
7 -96729.28 5643 194478.6 197908.2 196287.5 198418.2 86819.89 0.684
8 -96506.13 5570 194178.3 198098.8 196246.1 198681.8 86373.58 0.703
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 2 data (Fig. 4 classes)

Day 2 data (Fig. 6 classes)

Day 3

set.seed(01012)
max_II <- 100000
lc1 <- poLCA(f, data=dta_d3_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2 <- poLCA(f, data=dta_d3_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3 <- poLCA(f, data=dta_d3_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4 <- poLCA(f, data=dta_d3_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5 <- poLCA(f, data=dta_d3_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6 <- poLCA(f, data=dta_d3_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7 <- poLCA(f, data=dta_d3_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8 <- poLCA(f, data=dta_d3_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  31.65244 mins

Model comparison and selection

Model Comparison. (Day 3, n = 6151)
N of Class log-likelihood resid. df AIC BIC aBIC cAIC likelihood-ratio Entropy
1 -98064.40 6079 196272.8 196756.9 196528.1 196828.9 89662.66 —
2 -97182.50 6006 194655.0 195630.0 195169.3 195775.0 87898.86 0.807
3 -96657.05 5933 193750.1 195216.0 194523.3 195434.0 86847.97 0.751
4 -96275.67 5860 193133.3 195090.1 194165.4 195381.1 86085.21 0.776
5 -95935.70 5787 192599.4 195047.1 193890.4 195411.1 85405.26 0.729
6 -95646.71 5714 192167.4 195106.0 193717.3 195543.0 84827.27 0.725
7 -95387.65 5641 191795.3 195224.7 193604.1 195734.7 84309.16 0.706
8 -95185.53 5568 191537.1 195457.4 193604.7 196040.4 83904.92 0.74
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 3 data (Fig. 4 classes)

Day 3 data (Fig. 6 classes)

Day 4

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d4_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d4_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d4_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d4_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d4_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d4_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d4_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d4_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  27.12666 mins

Model comparison and selection

Model Comparison. (Day 4, n = 6026)
N of Class log-likelihood resid. df AIC BIC aBIC cAIC likelihood-ratio Entropy
1 -94315.04 5954 188774.1 189256.7 189028.0 189328.7 84646.60 —
2 -93523.25 5881 187336.5 188308.6 187847.8 188453.6 83063.02 0.793
3 -92963.27 5808 186362.5 187824.0 187131.2 188042.0 81943.06 0.763
4 -92555.29 5735 185692.6 187643.4 186718.7 187934.4 81127.11 0.716
5 -92238.14 5662 185204.3 187644.5 186487.8 188008.5 80492.81 0.69
6 -91954.80 5589 184783.6 187713.2 186324.5 188150.2 79926.12 0.741
7 -91705.51 5516 184431.0 187850.0 186229.4 188360.0 79427.56 0.711
8 -91526.79 5443 184219.6 188127.9 186275.3 188710.9 79070.10 0.757
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 4 data (Fig. 4 classes)

Day 4 data (Fig. 6 classes)